Appendix

Neural Information Processing Systems

In this section, we provide further intuition about the proposed AdaQN method. In the next stage, with $4m_0$ samples, we use the original Hessian inverse approximation $\nabla^2 R_{m_0}(w_{m_0})^{-1}$ and the new variable $w_{2m_0}$ for the BFGS updates. As $V_n = O(1/n)$ (since $n \geq m_0 = \Omega(\kappa^2 \log d)$) and $n = 2m$, condition (38) is equivalent to $(1/t_n) \leq t_n \leq (1/6.6)$. This parameter depends heavily on the variation/variance of the input features for linear models. Thus, we can focus on the diagonal components of these two matrices only.
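The snippet above applies BFGS updates to a Hessian inverse approximation. For reference, a minimal sketch of the classic BFGS inverse-Hessian update in Python/NumPy is given below; the function and variable names are illustrative and not taken from the paper:

```python
import numpy as np

def bfgs_inverse_update(H, s, y):
    """Classic BFGS update of an inverse-Hessian approximation H.

    s: iterate difference w_{k+1} - w_k
    y: gradient difference, e.g. grad f(w_{k+1}) - grad f(w_k)
    Implements H+ = (I - rho s y^T) H (I - rho y s^T) + rho s s^T
    with rho = 1 / (y^T s), assuming the curvature condition y^T s > 0.
    """
    rho = 1.0 / (y @ s)
    I = np.eye(H.shape[0])
    V = I - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)
```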




Sharpened Lazy Incremental Quasi-Newton Method

Lahoti, Aakash, Senapati, Spandan, Rajawat, Ketan, Koppel, Alec

arXiv.org Artificial Intelligence

We consider the finite sum minimization of $n$ strongly convex and smooth functions with Lipschitz continuous Hessians in $d$ dimensions. In many applications where such problems arise, including maximum likelihood estimation, empirical risk minimization, and unsupervised learning, the number of observations $n$ is large, and it becomes necessary to use incremental or stochastic algorithms whose per-iteration complexity is independent of $n$. Of these, the incremental/stochastic variants of the Newton method exhibit superlinear convergence, but incur a per-iteration complexity of $O(d^3)$, which may be prohibitive in large-scale settings. On the other hand, the incremental Quasi-Newton method incurs a per-iteration complexity of $O(d^2)$ but its superlinear convergence rate has only been characterized asymptotically. This work puts forth the Sharpened Lazy Incremental Quasi-Newton (SLIQN) method that achieves the best of both worlds: an explicit superlinear convergence rate with a per-iteration complexity of $O(d^2)$. Building upon the recently proposed Sharpened Quasi-Newton method, the proposed incremental variant employs a hybrid strategy that combines classic and greedy BFGS updates. The proposed lazy update rule distributes the computational load across iterations, enabling a per-iteration complexity of $O(d^2)$. Numerical tests demonstrate the superiority of SLIQN over other incremental and stochastic Quasi-Newton variants.
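To make the "classic and greedy BFGS updates" mentioned in the abstract concrete, below is a minimal sketch of both update types in the direct (Hessian-approximation) form used by greedy/sharpened Quasi-Newton methods. The greedy direction reduces to a ratio of diagonal entries, so only the diagonals of the two matrices are needed. All names are illustrative assumptions, not the authors' implementation of SLIQN:

```python
import numpy as np

def bfgs_update(G, A, u):
    """BFGS update of a Hessian approximation G toward the true Hessian A
    along direction u: G's curvature along u is replaced by A's."""
    Gu, Au = G @ u, A @ u
    return G - np.outer(Gu, Gu) / (u @ Gu) + np.outer(Au, Au) / (u @ Au)

def greedy_direction(G, A):
    """Greedy direction (in the style of Rodomanov and Nesterov): the
    coordinate vector maximizing the ratio of approximate to true
    curvature, which can be read off the diagonals of G and A."""
    i = np.argmax(np.diag(G) / np.diag(A))
    e = np.zeros(G.shape[0])
    e[i] = 1.0
    return e

# Classic BFGS takes u to be the iterate difference; the greedy variant
# takes u = greedy_direction(G, A) to contract the approximation error
# faster, at the cost of querying the true Hessian along u.
```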